A Real-Time Gaze Estimation Framework for Mobile Devices
Eye tracking is becoming an important component for unleashing new modes of human-machine interaction in augmented and virtual reality (AR/VR). To be responsive, an eye tracking system needs to operate at a real-time rate (> 30 Hz). However, in our experiments, modern gaze tracking algorithms run at no more than 5 Hz on mobile processors. In this talk, we present a real-time eye tracking algorithm that operates at 30 Hz on a mobile processor. Our algorithm achieves sub-0.5° gaze accuracy while requiring only 30K parameters, one to two orders of magnitude fewer than state-of-the-art algorithms.
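The abstract's real-time threshold is really a per-frame latency budget: a 30 Hz target leaves roughly 33 ms per frame, while a 5 Hz-class algorithm spends about 200 ms per frame. A minimal sketch of that budget check (the helper name and latency figures are illustrative, not from the paper):

```python
# Illustrative frame-budget check: a tracker sustains a target rate only
# if one frame's end-to-end latency fits within the frame period.
def meets_rate(latency_ms: float, target_hz: float = 30.0) -> bool:
    """True if per-frame latency fits the period implied by target_hz."""
    return latency_ms <= 1000.0 / target_hz

# A 5 Hz-class algorithm spends ~200 ms per frame; 30 Hz allows ~33.3 ms.
print(meets_rate(200.0))  # too slow for real time
print(meets_rate(30.0))   # fits the 30 Hz budget
```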
Specialization as a Candle in the Dark Silicon Regime
For decades computer architects have taken advantage of Moore's law to get bigger, faster, and more energy-efficient chips "for free," reaping the benefits of silicon process improvements and shrinking technology nodes. Each new technology node brought exponentially more transistors, balanced by exponentially lower transistor switching power, allowing the power budget for a fixed silicon area to remain relatively constant. Architects could count on more transistors---and use them to build more complex designs---without substantially increasing the total power budget for a chip. Today, however, rising CMOS leakage currents have limited further reductions in supply voltage, leading to a power-limited utilization wall and an end to classical Dennard scaling. This breakdown results in a new regime of dark silicon, in which vast swaths of silicon area must remain "dark" (powered down or under-clocked) most of the time. Architects must turn to novel approaches to squeeze ever more performance out of every last square-millimeter of silicon. This dissertation demonstrates that one viable approach to the dark silicon problem is specialization. Rather than relying solely on bigger, faster, general-purpose processors, chip architects have been increasingly augmenting their systems with special-purpose accelerators. These accelerators can speed up a given computation, allow it to run with less energy, or both. Using less energy frees up power and thermal budgets, allowing more computations to run in parallel and extending the computational capabilities we've come to demand from silicon. This dissertation presents two such specialized architectures. The first is GreenDroid, a mobile application processor built with custom accelerators targeting Android. The accelerators are energy-saving specialized circuits called conservation cores, or c-cores. In a 45-nm process, just 7 square millimeters of silicon dedicated to c-cores covers approximately 95% of our Android workload.
Powered by c-cores, GreenDroid uses 11x less energy on average than a general-purpose CPU. The second is Pixel Visual Core, a commercial accelerator from Google that enables energy-efficient computational photography and machine learning in the Pixel 2 and Pixel 3 smartphones. Pixel Visual Core is powered by an 8-core Image Processing Unit with 4,096 16-bit ALUs capable of performing 3.1 Tera-operations/second in under 5 watts. Compared to a 10-nm general-purpose application processor, the 28-nm Pixel Visual Core runs key compute kernels 3-6x faster and with 7-16x less energy.
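The utilization-wall argument above can be made concrete with back-of-the-envelope arithmetic: if each process generation keeps doubling transistor count while per-transistor switching power no longer shrinks (since supply voltage has stopped scaling), a fixed chip power budget forces an ever-smaller fraction of the die to run at full speed. A toy sketch of that reasoning, with purely illustrative scaling factors:

```python
# Back-of-the-envelope "utilization wall" sketch (illustrative numbers,
# not from the dissertation). With a fixed chip power budget and a stuck
# supply voltage, each node doubles transistors but barely reduces
# per-transistor power, so the runnable fraction of the die shrinks.
def active_fraction(generations: int,
                    transistor_growth: float = 2.0,
                    power_per_transistor_scaling: float = 1.0) -> float:
    """Fraction of transistors that can switch at full speed within the
    original (fixed) power budget after `generations` process nodes."""
    transistors = transistor_growth ** generations
    per_transistor_power = power_per_transistor_scaling ** generations
    power_if_all_active = transistors * per_transistor_power
    return min(1.0, 1.0 / power_if_all_active)

# With no per-transistor power scaling, three node generations leave
# only 1/8 of the silicon runnable at full speed: dark silicon.
print(active_fraction(3))  # 0.125
```

Under classical Dennard scaling, `power_per_transistor_scaling` would be about 0.5, keeping the active fraction at 1.0; the dark silicon regime is exactly the departure from that value.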
Efficient Complex Operators for Irregular Codes
Complex “fat operators” are important contributors to the efficiency of specialized hardware. This paper introduces two new techniques for constructing efficient fat operators featuring up to dozens of operations with arbitrary and irregular data and memory dependencies. These techniques focus on minimizing critical path length and load-use delay, which are key concerns for irregular computations. Selective Depipelining (SDP) is a pipelining technique that allows fat operators containing several, possibly dependent, memory operations. SDP allows memory requests to operate at a faster clock rate than the datapath, saving power in the datapath and improving memory performance. Cachelets are small, customized, distributed L0 caches embedded in the datapath to reduce load-use latency. We apply these techniques to Conservation Cores (c-cores) to produce coprocessors that accelerate irregular code regions while still providing superior energy efficiency. On average, these enhanced c-cores reduce EDP by 2× and area by 35% relative to baseline c-cores. They are up to 2.5× faster than a general-purpose processor and reduce energy consumption by up to 8× for a variety of irregular applications, including several SPECINT benchmarks.
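The headline metric here, EDP (energy-delay product), rewards designs that save energy without giving up speed: EDP = energy × execution time, so a 2× EDP reduction can come from energy savings, speedup, or a mix of both. A minimal sketch of the metric (the example values are illustrative, not measurements from the paper):

```python
# Energy-delay product (EDP): a combined efficiency metric that
# penalizes trading away too much performance for energy savings.
def edp(energy_j: float, time_s: float) -> float:
    """EDP in joule-seconds: lower is better."""
    return energy_j * time_s

# Illustrative comparison: same runtime, half the energy -> 2x lower EDP.
baseline = edp(energy_j=2.0, time_s=1.0)
enhanced = edp(energy_j=1.0, time_s=1.0)
print(baseline / enhanced)  # 2.0
```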
QsCores: Trading Dark Silicon for Scalable Energy Efficiency with Quasi-Specific Cores
Transistor density continues to increase exponentially, but power dissipation per transistor is improving only slightly with each generation of Moore’s law. Given constant chip-level power budgets, this exponentially decreases the percentage of transistors that can switch at full frequency with each technology generation. Hence, while the transistor budget continues to increase exponentially, the power budget has become the dominant limiting factor in processor design. In this regime, utilizing transistors to design specialized cores that optimize energy-per-computation becomes an effective approach to improving system performance. To trade transistors for energy efficiency in a scalable manner, we propose Quasi-specific Cores, or QsCores: specialized processors capable of executing multiple general-purpose computations while providing an order of magnitude more energy efficiency than a general-purpose processor. The QsCore design flow is based on the insight that similar code patterns exist within and across applications. Our approach exploits these similar code patterns to ensure that a small set of specialized cores supports a large number of commonly used computations. We evaluate QsCores’ ability to target both a single application library (e.g., data structures) and a diverse workload consisting of applications selected from different domains (e.g., SPECINT, EEMBC, and Vision). Our results show that QsCores can provide 18.4× better energy efficiency than general-purpose processors while reducing the amount of specialized logic required to support the workload by up to 66%.
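The core insight, that recurring code patterns let one quasi-specific core stand in for many fully specialized ones, can be illustrated with a toy tally. The function names and pattern labels below are entirely hypothetical; the sketch only shows why merging computations that share a pattern shrinks the specialized-logic count:

```python
# Toy sketch of the QsCores insight (hypothetical data): if several hot
# functions reduce to the same abstract code pattern, one quasi-specific
# core can serve all of them instead of one fully specialized core each.
hot_functions = {
    "list_insert": "pointer-chase",
    "tree_insert": "pointer-chase",
    "hash_lookup": "pointer-chase",
    "img_blur":    "stencil",
    "img_sharpen": "stencil",
    "crc32":       "bit-mix",
}

fully_specialized = len(hot_functions)             # one core per function
quasi_specific = len(set(hot_functions.values()))  # one core per pattern
savings = 1 - quasi_specific / fully_specialized

print(fully_specialized, quasi_specific)      # 6 3
print(f"{savings:.0%} less specialized logic")  # 50% less
```

Real QsCore merging operates on dependence-graph similarity rather than exact pattern labels, but the area-versus-coverage trade is the same shape.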